MCN: Modulated Convolutional Network
Algorithm 1 MCN training. L is the loss function, Q is the reconstructed filter, λ1 and λ2 are decay factors, and N is the number of layers. Update() updates the parameters based on our update scheme.
Input: a minibatch of inputs and their labels, unbinarized filters C, modulation filters M, and learning rates η1 and η2, corresponding to C and M, respectively.
Output: updated unbinarized filters C^{t+1}, updated modulation filters M^{t+1}, and updated learning rates η1^{t+1} and η2^{t+1}.
1: {1. Computing gradients with respect to the parameters:}
2: {1.1. Forward propagation:}
3: for k = 1 to N do
4:   Ĉ ← Binarize(C)
5:   Compute Q via Eqs. 3.13–3.14
6:   Compute the convolutional features using Eqs. 3.15–3.17
7: end for
8: {1.2. Backward propagation:}
9: {Note that the gradients are not binary.}
10: Compute δQ = ∂L/∂Q
11: for k = N to 1 do
12:   Compute δĈ using Eq. 3.20 and Eqs. 3.22–3.23
13:   Compute δM using Eq. 3.24 and Eqs. 3.26–3.27
14: end for
15: {2. Accumulating the parameter gradients:}
16: for k = 1 to N do
17:   C^{t+1} ← Update(δĈ, η1) (using Eq. 3.21)
18:   M^{t+1} ← Update(δM, η2) (using Eq. 3.25)
19:   η1^{t+1} ← λ1·η1
20:   η2^{t+1} ← λ2·η2
21: end for
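To make the steps of Algorithm 1 concrete, the following is a minimal single-layer sketch of one training iteration. It assumes sign() binarization, the reconstruction Q = Ĉ ∘ M as a stand-in for Eqs. 3.13–3.14, the filter loss (θ/2)·Σ‖C_i − Ĉ_i ∘ M‖² with θ = 1 as the objective, and plain SGD for Update(); all shapes and hyperparameter values are illustrative, not the chapter's actual configuration.

```python
import numpy as np

rng = np.random.default_rng(0)
C = rng.standard_normal((8, 3, 3))       # unbinarized filters (illustrative shape)
M = np.abs(rng.standard_normal((3, 3)))  # modulation filter (kept positive here)
eta1, eta2 = 0.05, 0.05                  # learning rates for C and M
lam1, lam2 = 0.9, 0.9                    # decay factors lambda1, lambda2

def loss(C, M):
    # Filter-loss stand-in: (1/2) * sum_i ||C_i - C_hat_i ∘ M||^2  (theta = 1)
    C_hat = np.sign(C)
    return 0.5 * np.sum((C - C_hat * M) ** 2)

# 1.1 Forward propagation: binarize, then reconstruct Q
C_hat = np.sign(C)
Q = C_hat * M                            # one modulated copy per filter

# 1.2 Backward propagation (the gradients are real-valued, not binary)
resid = C - C_hat * M
dC = resid                               # straight-through estimator: d(sign)/dC treated as 1
dM = -np.sum(resid * C_hat, axis=0)      # gradient w.r.t. M, cf. Eq. 3.27 with theta = 1

before = loss(C, M)
# 2. Update() step for C and M, then learning-rate decay
C = C - eta1 * dC
M = M - eta2 * dM
eta1, eta2 = lam1 * eta1, lam2 * eta2
after = loss(C, M)
```

With a sufficiently small step size the filter loss decreases after the joint update, which is the behavior the decaying learning-rate schedule in lines 19–20 relies on.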
where η2 is the learning rate. Furthermore, we have the following:

$$\frac{\partial L_S}{\partial M} = \frac{\partial L_S}{\partial Q} \cdot \frac{\partial Q}{\partial M} = \sum_{i,j} \frac{\partial L_S}{\partial Q_{ij}} \cdot \hat{C}_i, \qquad (3.26)$$

Based on Eq. 3.18, we have:

$$\frac{\partial L_M}{\partial M} = -\theta \sum_{i,j} \left(C_i - \hat{C}_i \circ M_j\right) \cdot \hat{C}_i. \qquad (3.27)$$
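The analytic gradient in Eq. 3.27 can be sanity-checked numerically. The sketch below assumes the filter loss takes the form L_M = (θ/2)·Σ_i ‖C_i − Ĉ_i ∘ M‖² (consistent with the gradient above, for a single shared modulation filter M), and compares the closed-form gradient against central finite differences; the shapes and θ value are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
theta = 0.5
C = rng.standard_normal((4, 3, 3))   # four unbinarized filters (illustrative)
C_hat = np.sign(C)                   # binarized filters
M = rng.standard_normal((3, 3))      # one shared modulation filter (assumed shape)

def L_M(M):
    # Assumed filter loss: (theta/2) * sum_i ||C_i - C_hat_i ∘ M||^2
    return 0.5 * theta * np.sum((C - C_hat * M) ** 2)

# Closed form, cf. Eq. 3.27: dL_M/dM = -theta * sum_i (C_i - C_hat_i ∘ M) ∘ C_hat_i
grad = -theta * np.sum((C - C_hat * M) * C_hat, axis=0)

# Central finite differences, one entry of M at a time
eps = 1e-6
fd = np.zeros_like(M)
for idx in np.ndindex(M.shape):
    Mp, Mm = M.copy(), M.copy()
    Mp[idx] += eps
    Mm[idx] -= eps
    fd[idx] = (L_M(Mp) - L_M(Mm)) / (2 * eps)

err = np.max(np.abs(grad - fd))      # should be near zero
```

Because L_M is quadratic in M, the central-difference estimate agrees with the closed form up to floating-point error, confirming the sign and the Ĉ_i factor in Eq. 3.27.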
Details about the derivatives concerning the center loss can be found in [245]. These derivations show that MCNs can be learned with the BP algorithm. The quantization process leads to a new loss function via a simple projection function, which does not affect the convergence of MCNs. We describe our algorithm in Algorithm 1.
3.4.4 Parameter Evaluation
θ and λ: Eq. 3.18 contains the parameters θ and λ, which weight the filter loss and the center loss, respectively. Their effect is evaluated on CIFAR-10 for a 20-layer MCN with width 16-16-32-64; the architecture details can be found in [281] and are also shown in Fig. 3.6. The Adadelta optimization algorithm [282] is used during the training process,